Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist
Apache Spark is a popular system aimed at the analysis of large data sets,
but recent studies have shown that certain computations---in particular, many
linear algebra computations that are the basis for solving common machine
learning problems---are significantly slower in Spark than when done using
libraries written in a high-performance computing framework such as the
Message-Passing Interface (MPI).
To remedy this, we introduce Alchemist, a system designed to call MPI-based
libraries from Apache Spark. Using Alchemist with Spark helps accelerate linear
algebra, machine learning, and related computations, while still retaining the
benefits of working within the Spark environment. We discuss the motivation
behind the development of Alchemist, and we provide a brief overview of its
design and implementation.
We also compare the performance of pure Spark implementations with that of
Spark implementations that leverage MPI-based codes via Alchemist. To do so, we
use data science case studies: a large-scale application of the conjugate
gradient method to solve very large linear systems arising in a speech
classification problem, where we see an improvement of an order of magnitude;
and the truncated singular value decomposition (SVD) of a 400GB
three-dimensional ocean temperature data set, where we see a speedup of up to
7.9x. We also illustrate that the truncated SVD computation is easily scalable
to terabyte-sized data by applying it to data sets of sizes up to 17.6TB.
Comment: Accepted for publication in Proceedings of the 24th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, London, UK,
201
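For readers unfamiliar with the conjugate gradient method named in the abstract above, the following is a minimal single-machine sketch in plain Python. It is the textbook algorithm for a symmetric positive-definite system, not the Alchemist or MPI implementation; the function names and the small 2x2 example system are assumptions made for illustration.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Illustrative only: a distributed version would replace matvec and dot
    with parallel operations over partitioned data.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical example system, chosen only to exercise the function.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

Each iteration is dominated by one matrix-vector product, the kind of linear algebra kernel the abstract reports as slower in Spark than in MPI-based libraries.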
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall and skinny, which enables the algorithms to map
conveniently into Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
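One common reason a tall-and-skinny matrix fits a data-parallel model is that its small Gram matrix A^T A is a sum of per-row-block contributions, so each partition can be processed independently and the results added. The sketch below illustrates that idea in plain Python; it is an assumption about the general technique, not a description of the authors' code, and the helper names and the 4x2 example matrix are hypothetical.

```python
from functools import reduce


def partial_gram(rows):
    """Return the n x n sum of outer products r^T r over one partition's rows."""
    n = len(rows[0])
    G = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                G[i][j] += r[i] * r[j]
    return G


def add_matrices(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def gram_by_partition(partitions):
    """Map each row block to its partial Gram matrix, then reduce by addition."""
    return reduce(add_matrices, (partial_gram(p) for p in partitions))


# Hypothetical 4 x 2 matrix split into two row blocks.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
partitions = [A[:2], A[2:]]
G = gram_by_partition(partitions)
```

Because the result is only n x n for a matrix with n columns, it can be gathered on one node; its eigenvectors are the right singular vectors of A, which give the principal components when the columns of A have been centered.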
Input-independent, scalable and fast string matching on the Cray XMT
String searching is at the core of many security and network applications such as search engines, intrusion detection systems, virus scanners and spam filters. The growing size of online content and increasing wire speeds push the need for fast, and often real-time, string searching solutions. Under these conditions, many, if not all, software implementations targeting conventional cache-based microprocessors do not perform well. They exhibit either low overall performance or highly variable performance depending on the type of input. For this reason, real-time state-of-the-art solutions rely on either custom hardware or Field-Programmable Gate Arrays (FPGAs), at the expense of overall system flexibility and programmability. This paper presents a software-based implementation of the Aho-Corasick string searching algorithm on the Cray XMT multithreaded shared-memory machine. Our solution relies on the particular features of the XMT architecture and on several algorithmic strategies: it is fast and scalable, and its performance is virtually content-independent. On a 128-processor Cray XMT, it reaches a scanning speed of ≈ 28 Gbps with a performance variability below 10%. In the 10 Gbps performance range, variability is below 2.5%. By comparison, an Intel dual-socket, 8-core system running at 2.66 GHz achieves a peak performance that varies from 500 Mbps to 10 Gbps depending on the type of input and the dictionary size.
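As background for the abstract above, here is a minimal sequential sketch of the textbook Aho-Corasick algorithm in plain Python: a trie of the dictionary patterns plus failure links, scanned once over the input. It is not the Cray XMT implementation and does not reflect its architecture-specific strategies; the function names and the small example dictionary are assumptions made for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists for the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every dictionary match in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


# Hypothetical dictionary and input text.
automaton = build_automaton(["he", "she", "his", "hers"])
matches = search("ushers", automaton)
# matches == [(1, "she"), (2, "he"), (2, "hers")]
```

The scan consumes each input character once regardless of how many patterns are in the dictionary, which is the property that makes the algorithm a natural starting point for throughput that depends little on input content.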